Multi-type attribute index for a document database

ABSTRACT

Multi-type attribute indexes may be implemented for document databases. When a request to perform a query at a document database is received that is directed toward an indexed attribute that has multiple data types for values stored in the multi-type attribute index, a predicate in the query may be replaced with a different predicate that is applicable to search the multi-type attribute index according to a sort order for the multiple data types stored in the multi-type attribute index. A plan that includes the different predicate may be performed in order to provide a result of the query to a user.

BACKGROUND

Indexing schemes provide data management systems, such as databases, with fast querying capabilities. Instead of scanning an entire data set, the data management system may apply search criteria, such as query predicates, to the evaluation of an index, which may be optimized to map the location of data satisfying the search criteria to portions of the index. However, semi-structured data, such as data found in document databases, NoSQL, or other non-relational data storage systems, can be more difficult to index as the relationships between data may not be strictly defined according to a common structure. Indexing techniques that can provide faster performance for semi-structured data are therefore highly desirable.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a logical block diagram illustrating a multi-type attribute index for a document database, according to some embodiments.

FIG. 2 is a logical block diagram illustrating a provider network that implements a database service that offers a document database, according to some embodiments.

FIG. 3 is a logical block diagram illustrating a query engine and index manager for implementing a multi-type attribute index for a document database, according to some embodiments.

FIG. 4 is an example illustration of a multi-type attribute index, according to some embodiments.

FIG. 5 is a high-level flowchart illustrating various methods and techniques to implement a multi-type attribute index for a document database, according to some embodiments.

FIG. 6 is a high-level flowchart illustrating various methods and techniques to generate a query plan for a multi-type attribute index, according to some embodiments.

FIG. 7 is a high-level flowchart illustrating various methods and techniques to create a multi-type attribute index, according to some embodiments.

FIG. 8 is a block diagram illustrating an example computing system, according to some embodiments.

While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.

DETAILED DESCRIPTION

The systems and methods described herein implement a multi-type attribute index for a document database, according to some embodiments. Indexing techniques organize data in order to improve the performance of searches or queries directed to the data, in various embodiments. An index may, for instance, identify the locations of data desired in a query (or store the data itself) in a structure, such as a b-tree, that can be efficiently searched to discover all of the values that satisfy the query, in some embodiments. Multi-type attribute indexes may provide a highly performant search structure for data sets, such as a document database, in which multiple types of data may be stored to represent the same value, in various embodiments, instead of limiting different data types to separately maintained indexes or translating or reformatting data into a common data type. In this way, multi-type attribute indexes can reduce system overhead from maintaining separate indexes for different data types of the same attribute, reduce processing costs of the database system to maintain and update the indexes, and mitigate other index management challenges that can increase according to the number of indexes maintained.

FIG. 1 is a logical block diagram illustrating a multi-type attribute index for a document database, according to some embodiments. Document database 110 may store data that is non-relational or semi-structured in various embodiments. For example, document database 110 may store documents that specify various attributes which may be shared with other documents (e.g., different documents have “attribute a”, although they may have different values) or attributes which are exclusive to a particular document (e.g., only one document has “attribute z”). Even documents that share attributes may have different data types for values (e.g., string, integer, array, timestamp, or custom data types, among other data types). Some data types may have multiple values, as discussed below with regard to FIG. 4.

Multi-type attribute index 130 may be created for an attribute of documents stored in document database 110. Different data types, such as data types 142, 144, 146, and 148, may have values, such as attribute values 132 a, 132 b, 132 c, 132 d, 132 e, 132 f, 132 g, 132 h, 132 i and so on, stored as entries (e.g., leaf nodes in a b-tree) as part of multi-type attribute index 130. Multi-type attribute index 130 may be created and maintained using a variety of different index structure formats, such as tree-based structures like a b-tree or hash-based structures, among others. Multi-type attribute index 130 may store entries according to a sort order for data types 140 which allows for comparisons across data types to be made. If the sort order were, for example, array<string<integer, then any array attribute value would be less than any string value. Implementing sort order for data types 140 may allow for type bracketing (e.g., restricting queries to searching for those types of data values specified in the query predicates), in some embodiments, as discussed below with regard to FIGS. 4-6.
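A sort order for data types can be illustrated with a composite key that ranks the data type first and the value second. The following is a minimal sketch, assuming the example order array<string<integer; the names `TYPE_RANK` and `sort_key` are hypothetical and are not part of any described embodiment.

```python
# Hypothetical type ranks for the example order array < string < integer.
TYPE_RANK = {list: 0, str: 1, int: 2}


def sort_key(value):
    """Return a key that orders values by data type first, then by value."""
    return (TYPE_RANK[type(value)], value)


# Values of one attribute, drawn from documents with different data types.
values = [42, "pear", [3, 1], 7, "apple", [1, 2]]

# Tuples compare element by element, so values of different types are
# ordered by rank alone and are never compared with each other directly.
ordered = sorted(values, key=sort_key)
print(ordered)
```

Because the rank is compared before the value, any array sorts before any string, and any string sorts before any integer, while values of the same type keep their usual ordering.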

Document database 110 may implement a query engine 120 to provide access to documents stored in document database 110. Queries, such as query 102, may include one or more predicate(s) 104, which may identify attribute values of documents that a user or system that submitted query 102 may wish to identify. Query engine 120 may interpret the query 102 and may recognize if the query 102 includes a predicate 104 that is directed to an attribute that has been indexed, such as the attribute for multi-type attribute index 130. For such queries, query engine 120 may generate a query plan that replaces the predicate 104 in the query for the attribute with a different predicate, such as search predicates 124. In this way, the appropriate portion of the multi-type attribute index 130 may be searched. For instance, if the query 102 has a predicate 104 that identifies a range of integer values (e.g., attribute a<integer 2000 and a>integer 1499), then search predicates may ensure that the predicates for the attribute are not applied to other data types (e.g., strings, timestamps, etc.). As discussed below with regard to FIGS. 4-6, in some embodiments a search predicate may add Boolean operations to limit the search of multi-type attribute index 130 to the data type in the query predicate using the sort order for data types. In some embodiments, the comparison operation may be overloaded or otherwise mapped to a comparison operation specific to the data type of the query predicate (e.g., a string comparator for strings, an array comparator for arrays, and so on).

Search 122 may be performed according to the query plan generated by query engine 120 using search predicate(s) 124 in place of the predicates 104 that were directed to the indexed attribute. In this way, searches of the multi-type attribute index 130 may be directed to locations storing entries that may include data of the type indicated in predicates 104. Results, such as result 106, from the search may include the document identifiers 108 of those documents that have an attribute value that satisfies the predicate.
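The search described above can be sketched over a sorted list of index entries standing in for the index structure. This is an illustrative sketch only: the helper names (`build_index`, `search_exclusive_range`), the sample documents, and the use of a flat list with binary search in place of a b-tree are all assumptions.

```python
import bisect

# Hypothetical type ranks for the example order array < string < integer.
TYPE_RANK = {list: 0, str: 1, int: 2}


def sort_key(value):
    """Order values by data type first, then by value within the type."""
    return (TYPE_RANK[type(value)], value)


def build_index(docs, attribute):
    """Return sorted (key, doc_id) entries for documents with the attribute."""
    entries = [(sort_key(doc[attribute]), doc_id)
               for doc_id, doc in docs.items() if attribute in doc]
    entries.sort()
    return entries


def search_exclusive_range(index, low, high):
    """Return ids of documents whose indexed value v satisfies low < v < high.

    Both bounds carry the rank of their own data type, so the scanned slice
    never leaves the region of the index that holds that type.
    """
    keys = [key for key, _ in index]
    start = bisect.bisect_right(keys, sort_key(low))
    end = bisect.bisect_left(keys, sort_key(high))
    return [doc_id for _, doc_id in index[start:end]]


docs = {
    "d1": {"a": 1500}, "d2": {"a": "1700"}, "d3": {"a": 1999},
    "d4": {"a": 2000}, "d5": {"a": [1600]}, "d6": {"a": 1499},
    "d7": {"b": 1},
}
index = build_index(docs, "a")

# Corresponds to the example predicate: a < integer 2000 and a > integer 1499.
result = search_exclusive_range(index, 1499, 2000)
print(result)
```

The string "1700" and the array [1600] are skipped without ever being compared with an integer, because their entries sit in other regions of the sorted index.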

Please note that previous descriptions of a multi-type attribute index are not intended to be limiting, but are merely provided as logical examples. Different implementations of a document database, query engine, or multi-type attribute index may be conceived.

This specification begins with a general description of a provider network that may implement a database service that may implement a multi-type attribute index for a document database, in one embodiment. Then various examples of a database service are discussed, including different components/modules, or arrangements of components/modules, that may be employed as part of implementing the database service, in one embodiment. A number of different methods and techniques to implement a multi-type attribute index for a document database are then discussed, some of which are illustrated in accompanying flowcharts. Finally, a description of an example computing system upon which the various components, modules, systems, devices, and/or nodes may be implemented is provided. Various examples are provided throughout the specification.

FIG. 2 is a logical block diagram illustrating a provider network that implements a database service that offers a document database, according to some embodiments. Provider network 200 may be a private or closed system, in one embodiment, or may be set up by an entity such as a company or a public sector organization to provide one or more services (such as various types of cloud-based storage) accessible via the Internet and/or other networks to clients 250, in another embodiment. In one embodiment, provider network 200 may be implemented in a single location or may include numerous data centers hosting various resource pools, such as collections of physical and/or virtualized computer servers, storage devices, networking equipment and the like (e.g., computing system 1000 described below with regard to FIG. 8), needed to implement and distribute the infrastructure and storage services offered by the provider network 200. In one embodiment, provider network 200 may implement various computing resources or services, such as database service 210 or other data processing (e.g., relational or non-relational (NoSQL) database query engines, map reduce processing, data warehouse, data flow processing, and/or other large scale data processing techniques), data storage services (e.g., an object storage service, block-based storage service, or data storage service that may store different types of data for centralized access), virtual compute services, and/or any other type of network based services (which may include various other types of storage, processing, analysis, communication, event handling, visualization, and security services not illustrated).

In various embodiments, the components illustrated in FIG. 2 may be implemented directly within computer hardware, as instructions directly or indirectly executable by computer hardware (e.g., a microprocessor or computer system), or using a combination of these techniques. For example, the components of FIG. 2 may be implemented by a system that includes a number of computing nodes (or simply, nodes), in one embodiment, each of which may be similar to the computer system embodiment illustrated in FIG. 8 and described below. In one embodiment, the functionality of a given system or service component (e.g., a component of database service(s) 210) may be implemented by a particular node or may be distributed across several nodes. In some embodiments, a given node may implement the functionality of more than one service system component (e.g., more than one data store component).

Database service(s) 210 may include various types of database services, in one embodiment, (such as various kinds of non-relational and/or NoSQL databases) for storing, querying, and updating data. In at least some embodiments, database service 210 may provide a document database, storing documents (e.g., JavaScript Object Notation (JSON) documents in a serializable format, such as Application Specific Object Notation (ASON) or Binary JSON (BSON)). Such database services 210 may be enterprise-class database systems that are highly scalable and extensible. In one embodiment, queries may be directed to a database in database service(s) 210 that is distributed across multiple physical resources, and the database system may be scaled up or down on an as-needed basis. The database system may work effectively with database schemas of various types and/or organizations, in different embodiments. In one embodiment, clients/subscribers may submit queries in a number of ways, e.g., interactively via a query language interface to the database system. In other embodiments, external applications and programs may submit queries using driver interfaces to the database system.

In one embodiment, clients 250 may encompass any type of client configurable to submit network-based requests to provider network 200 via network 260, including requests for database service(s) 210 (e.g., to query a document database 210 that uses a multi-type attribute index). For example, in one embodiment a given client 250 may include a suitable version of a web browser, or may include a plug-in module or other type of code module that may execute as an extension to or within an execution environment provided by a web browser. Alternatively, in a different embodiment, a client 250 may encompass an application such as a database application (or user interface thereof), a media application, an office application or any other application that may make use of storage resources in data storage service(s) to store and/or access the data to implement various applications. In one embodiment, such an application may include sufficient protocol support (e.g., for a suitable version of Hypertext Transfer Protocol (HTTP)) for generating and processing network-based services requests without necessarily implementing full browser support for all types of network-based data. That is, client 250 may be an application that may interact directly with provider network 200, in one embodiment. In one embodiment, client 250 may generate network-based services requests according to a Representational State Transfer (REST)-style network-based services architecture, a document- or message-based network-based services architecture, or another suitable network-based services architecture.

In one embodiment, a client 250 may provide access to provider network 200 to other applications in a manner that is transparent to those applications. For example, client 250 may integrate with a database on database service(s) 210. In such an embodiment, applications may not need to be modified to make use of the storage system service model. Instead, the details of interfacing to the database service(s) 210 may be coordinated by client 250.

Clients 250 may convey network-based services requests to and receive responses from provider network 200 via network 260, in one embodiment. In one embodiment, network 260 may encompass any suitable combination of networking hardware and protocols necessary to establish network-based communications between clients 250 and provider network 200. For example, network 260 may encompass the various telecommunications networks and service providers that collectively implement the Internet. In one embodiment, network 260 may also include private networks such as local area networks (LANs) or wide area networks (WANs) as well as public or private wireless networks. For example, both a given client 250 and provider network 200 may be respectively provisioned within enterprises having their own internal networks. In such an embodiment, network 260 may include the hardware (e.g., modems, routers, switches, load balancers, proxy servers, etc.) and software (e.g., protocol stacks, accounting software, firewall/security software, etc.) necessary to establish a networking link between given client 250 and the Internet as well as between the Internet and provider network 200. It is noted that in one embodiment, clients 250 may communicate with provider network 200 using a private network rather than the public Internet.

Database service 210 may implement request routing 202, in one embodiment. Request routing may receive, authenticate, parse, throttle and/or dispatch service requests, among other things, in one embodiment. In one embodiment, database service 210 may implement control plane 220 to implement one or more administrative components (such as automated admin instances which may provide a variety of visibility and/or control functions, as described in more detail herein). In one embodiment, database service 210 may also implement multiple storage nodes 230, each of which may manage documents 242 of a data set (e.g., a database) on behalf of clients/users or on behalf of the database service (and its underlying system) which may be stored in storage 236 (on storage devices attached to storage nodes 230) or, in another embodiment, in external storage which may be accessed by storage nodes 230 (e.g., via network connections) (not illustrated).

Control plane 220 may provide visibility and control to system administrators, detect split events for storage nodes, and/or perform anomaly control or resource allocation, in one embodiment. In one embodiment, control plane 220 may also include an admin console, through which system administrators may interact with the data storage service (and/or the underlying system). In one embodiment, the admin console may be the primary point of visibility and control for the data storage service (e.g., for configuration or reconfiguration by system administrators). For example, the admin console may be implemented as a relatively thin client that provides display and control functionality to system administrators and/or other privileged users, and through which system status indicators, metadata, and/or operating parameters may be observed and/or updated. Control plane 220 may provide an interface or access to information stored about one or more detected control plane events, such as split requests to be processed, at storage nodes 230, in one embodiment.

Control plane 220 may direct the performance of different types of control plane operations among the nodes, systems, or devices implementing database service 210, in one embodiment. For instance, control plane 220 may communicate with storage nodes to initiate the performance of various control plane operations, such as moves, splits, update documents, delete documents, create indexes, etc. In one embodiment, control plane 220 may update a task registry (or some other table or data structure) with the status, state, or performance information of the control plane operations currently being performed.

In one embodiment, request routing 202 may support handling requests formatted according to an interface to support different types of web services requests. For example, in one embodiment, database service 210 may implement a particular web services application programming interface (API) that supports a variety of operations on tables (or other data objects) that are maintained and managed on behalf of clients/users by the data storage service system (and/or data stored in those tables). In one embodiment, database service 210 may support different types of web services requests. For example, in one embodiment, database service 210 may implement a particular web services application programming interface (API) that supports a variety of operations on documents (or other data objects) that are maintained and managed on behalf of clients/users by the database service 210 (and/or data stored in those documents). In one embodiment, request routing 202 may perform parsing and/or throttling of service requests, authentication and/or metering of service requests, dispatching service requests, and/or maintaining partition assignments that map processing nodes to partitions.

Storage nodes 230 may implement database management 232, in one embodiment. Database management 232 may create, update, define, query, and/or otherwise administer databases, in one embodiment. For instance, database management 232 may maintain a database according to a database model (e.g., a non-relational database model). In one embodiment, database management 232 may allow a client to manage data definitions (e.g., Data Definition Language (DDL) requests to describe attributes, requests to add item attributes, etc.). In one embodiment, database management 232 may handle requests to access the data (e.g., to insert, modify, add, or delete data as well as requests to query for data by generating query execution plans to determine which partitions of a database may need to be evaluated or searched in order to service the query). In one embodiment, database management 232 may also perform other management functions, such as enforcing access controls or permissions, concurrency control, or recovery operations. In one embodiment, database management 232 may send requests to storage engine 234 to access documents 242 in order to process queries (e.g., requests to obtain the identity of documents with attributes specified in a query predicate).

In one embodiment, storage nodes 230 may implement storage engine 234 to access storage 236. Storage engine 234 may perform requests on behalf of database management to create, read, update and delete (CRUD) data in a partition, in one embodiment. Storage engine 234 may implement buffers, caches, or other storage components to reduce the number of times storage is accessed, in one embodiment. Storage engine 234 may implement various storage interfaces to access storage 236. In those embodiments where external storage is a network-based data storage service, like another data storage service in provider network 200 in FIG. 2, storage engine 234 may establish a network connection with the service as part of obtaining access to a storage unit (e.g., by submitting requests formatted according to a protocol or API to establish the connection). In another embodiment, storage engine 234 may access internal storage using storage protocols (e.g., Small Computer Systems Interface (SCSI)) over a bus or other interconnect that directly connects a host implementing storage engine 234 with storage 236.

In one embodiment, database service 210 may provide functionality for creating, accessing, and/or managing a document database at storage nodes within a single-tenant environment that are different than those that provide functionality for creating, accessing, and/or managing tables maintained in nodes within a multi-tenant environment. In another embodiment, functionality to support both multi-tenant and single-tenant environments may be included in any or all of the components illustrated in FIG. 3. Note also that in one embodiment, one or more storage nodes 230 process queries and other access requests on behalf of clients directed to tables. Some of these storage nodes may operate as if they were in a multi-tenant environment, and others may operate as if they were in a single-tenant environment. In one embodiment, storage nodes 230 that operate as in a multi-tenant environment may be implemented on different storage nodes (or on different virtual machines executing on a single host) than storage nodes that operate as in a single-tenant environment.

In at least some embodiments, the systems underlying the database service 210 described herein may store data on behalf of storage service clients (e.g., client applications, users, and/or subscribers) in documents containing one or more attributes (e.g., a JavaScript Object Notation document). In some embodiments, database service 210 may present clients/users with a data model in which each document maintained on behalf of a client/user contains one or more attributes. The attributes of a document may be a collection of name-value pairs, in any order. In some embodiments, each attribute in an item may have a name, a type, and a value. Some attributes may be single valued, such that the attribute name is mapped to a single value, while others may be multi-value, such that the attribute name is mapped to two or more values. In some embodiments, the name of an attribute may always be a string, but its value may be a string, number, string set, or number set. The following are all examples of attributes: “ImageID”=1, “Title”=“flower”, “Tags”={“flower”, “jasmine”, “white”}, “Ratings”={3, 4, 2}. The documents may be managed by assigning each document a primary key value (which may include one or more attribute values), and this primary key value may also be used to uniquely identify the document. In some embodiments, a large number of attributes may be defined across the documents in a table, but each document may contain a sparse set of these attributes (with the particular attributes specified for one document being unrelated to the attributes of another document in the same table), and all of the attributes may be optional except for the primary key attribute(s). In other words, the documents maintained by the database service 210 (and the underlying storage system) may have no pre-defined schema other than their reliance on the primary key. Note that in some embodiments, if an attribute is included in a document, its value cannot be null or empty (e.g., attribute names and values cannot be empty strings), and within a single document, the names of its attributes may be unique.

FIG. 3 is a logical block diagram illustrating a query engine and index manager for implementing a multi-type attribute index for a document database, according to some embodiments. Database management 232 may implement index management 350, in various embodiments, to create, maintain, and remove multi-type attribute indexes for a document database. For example, a request to create (or delete) an attribute index 360 may be handled by index management 350. Index management 350 may parse the request to identify the attribute used to create the attribute index. Index management 350 may allocate or obtain storage for the index (e.g., via storage engine 234 in FIG. 2) and issue a storage request to create or update the attribute index 364. For example, as discussed below with regard to FIG. 7, index management 350 may scan the documents of the document database to search for the attribute specified in the index, and utilize comparison operations for comparing values of the same data type and for comparing values across data types using the sort order for data types (as discussed above with regard to FIG. 1 and below with regard to FIG. 4) to create entries in an index for the attribute values. As also discussed below with regard to FIG. 4, in some embodiments, multi-value attributes may be stored in the attribute index both as the individual data values of the multi-value attribute, and as the complete multi-value attribute. In this way, queries with a predicate that may be satisfied by one or more individual values of a multi-value attribute can be performed without violating type bracketing restrictions, in some embodiments. Index management 350 may notify query engine 310 of the existence of multi-type attribute indexes for the document database in order to utilize the multi-type attribute index (e.g., instead of performing a heap or other brute force scan) to perform a query.
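Storing a multi-value attribute both as its individual values and as the complete value can be sketched as follows. This is a minimal illustration, assuming arrays are the only multi-value type and an unsorted list of entries stands in for the index; `index_entries` and `find_integer_equal` are hypothetical names.

```python
def index_entries(doc_id, value):
    """Yield (indexed_value, doc_id) entries for one attribute value.

    A multi-value attribute (modeled here as a list) produces one entry for
    the complete value and one entry for each of its elements.
    """
    yield (value, doc_id)
    if isinstance(value, list):
        for element in value:
            yield (element, doc_id)


def find_integer_equal(entries, target):
    """Return ids of documents with an integer entry equal to target.

    The type check restricts the match to integer entries (a simple form of
    type bracketing), and duplicate document ids are filtered out.
    """
    matches = [doc_id for value, doc_id in entries
               if type(value) is int and value == target]
    return list(dict.fromkeys(matches))  # keeps first-seen order


# Sample values of one attribute, keyed by a hypothetical document id.
attribute_values = {"d1": [3, 9], "d2": 3, "d3": "3", "d4": [3, 3]}
entries = [entry
           for doc_id, value in attribute_values.items()
           for entry in index_entries(doc_id, value)]

print(find_integer_equal(entries, 3))
```

Document d4 contributes two element entries with the value 3 but appears once in the result, which is why a duplicate filter is part of the sketch. The string "3" in d3 is not matched because it has a different data type.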

Database management 232 may implement query engine 310, in various embodiments, to handle queries directed to the document database, like a query on an indexed attribute 370, in some embodiments. Query engine 310 may implement query parsing and validation 320, in some embodiments. Query 370 may be received, for example, and checked for valid form, references, attributes, predicates, operators, etc. In some embodiments, query parsing and validation 320 may check for a query implicitly specifying a multi-value attribute by including multiple conflicting predicate values (e.g., attribute a<7 and attribute a>11), and allow the query to proceed if multi-value attributes have been stored. Similarly, requests to perform data manipulation, such as requests to update an indexed attribute 374 (or document), may be received, which may be likewise parsed, planned, and executed by query engine 310.

Query parsing and validation 320 may generate a parse tree or other parsed form or output of a query, which may identify predicates and the respective attributes of the predicates. In this way, attributes for which a multi-type attribute index exists may be identified to query planning and optimization 330. Query engine 310 may implement query planning and optimization 330 to generate a query plan or other instructions to perform a query, in some embodiments. In at least some embodiments, the query plan may be optimized (e.g., for cost). Cost optimization may include identifying indexes as structures for quickly locating desired data in a query without performing a scan of an entire database. Therefore, query planning and optimization 330 may identify predicates in a query, such as query 370, which are directed to an attribute for which an index exists because the cost of utilizing such a multi-type attribute index may be lower than other operations to perform the query. In some embodiments, queries may be directed explicitly to an index, as if it were a secondary index or materialized view of the document database.

Query planning and optimization 330 may implement multi-type/multi-value predicate rewriting 332 to replace predicates received in a query with predicates that can be applied to search the appropriate portions of an index, in some embodiments. Type bracketing, for instance, as discussed below with regard to FIGS. 5 and 6, may be implemented by replacing a predicate with a single Boolean operation (e.g., a<4) with a predicate that includes another comparison to a minimum or maximum value of a neighboring data type according to the sort order for data types in the index (e.g., a<4 AND a>minimum string value, where strings<integers in the sort order). Predicate rewriting may also be performed to replace a Boolean expression with an overloaded operator specific to the type of data included in the predicate, in some embodiments. If, for instance, the query indicates a Boolean expression where attribute a<“Smith”, then multi-type/multi-value predicate rewriting may rewrite the predicate in the plan to invoke a string comparison operation for evaluating the expression with respect to values in the multi-type attribute index. Other operations, such as filtering operations for duplicate documents, or determining whether to include multiple predicates in the index evaluation, as discussed below with regard to FIG. 6, may be implemented by query planning and optimization 330, in some embodiments.
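The rewriting of a one-sided predicate into a type-bracketed pair of predicates can be sketched as follows. This sketch is an assumption-laden illustration rather than the described implementation: it bounds each data type with sentinel keys derived from a hypothetical type rank (instead of the minimum or maximum value of a neighboring data type), and the names `rewrite` and `matches` are invented for the example.

```python
import operator

# Hypothetical type ranks for an order array < string < integer.
TYPE_RANK = {list: 0, str: 1, int: 2}
OPERATORS = {"<": operator.lt, ">": operator.gt}


def rewrite(op, value):
    """Rewrite `a <op> value` into two predicates over index keys.

    Index keys are (rank, value) tuples. The one-element tuple (rank,) sorts
    before every key of that type and (rank + 1,) sorts after every key of
    that type, so the two act as sentinel bounds that bracket the type.
    """
    rank = TYPE_RANK[type(value)]
    type_low, type_high = (rank,), (rank + 1,)
    key = (rank, value)
    if op == "<":
        return [(">", type_low), ("<", key)]
    if op == ">":
        return [(">", key), ("<", type_high)]
    raise ValueError(f"unsupported operator: {op}")


def matches(key, predicates):
    """Evaluate rewritten predicates against one index key."""
    return all(OPERATORS[op](key, bound) for op, bound in predicates)


keys = [(0, [2]), (1, "Jones"), (1, "zebra"), (2, 1), (2, 3), (2, 7)]

# a < 4 is bracketed so that it matches integer entries only.
integer_hits = [k for k in keys if matches(k, rewrite("<", 4))]
# a < "Smith" is bracketed so that it matches string entries only.
string_hits = [k for k in keys if matches(k, rewrite("<", "Smith"))]
print(integer_hits, string_hits)
```

Without the added bound, the predicate a<4 evaluated over keys would also select every array and string entry, since those sort before all integers in this order.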

Query engine 310 may implement query execution platform 340 to perform the generated query plan, in various embodiments. Query execution platform 340 may issue storage requests 380 to perform the query according to the plan (which may be received and interpreted by storage engine 234 in order to access the data in the underlying storage). Storage responses 382 may be received and evaluated to determine whether attribute values satisfy query 370 predicate(s). In at least some embodiments, caching or other buffering techniques may be implemented to store, for example, all or a portion of an index in memory for quick access. Query execution platform 340 may perform operations to join, filter, aggregate, or order results, or to apply other predicates or operations specified or caused by queries, such as query 370. Query engine 310 may then provide the results 372 of the query, in some embodiments. Results 372 may be, for instance, a list of document identifiers which store the attribute and satisfy the query, in some embodiments.

When other statements are handled by query engine 310, such as insertions or deletions of documents or attributes included therein (e.g., request 374), query engine 310 may provide index updates 362 in addition to performing the storage requests to achieve the requested change in the document database, in some embodiments. For example, index update 362 may notify index management to insert a new attribute value for an indexed attribute.

FIG. 4 is an example illustration of a multi-type attribute index, according to some embodiments. Attribute index 400 may be a tree-based index structure, such as a b-tree index, in some embodiments. A root node 410 may specify the various attribute value ranges for child nodes, which may in turn specify the respective value ranges of their child nodes until leaf nodes, such as leaf nodes 420 a, 420 b, 420 c, and 420 d, are reached. Insertions into (and deletions from) attribute index 400 may be made according to a sort order for data types 440. For example, the sort order 440 for data types may specify that numbers<strings<arrays<timestamps<Booleans and so on. Insertions or deletions that compare a value of the attribute that is one data type with a value of another data type may utilize sort order 440 to determine relative placement of the value with respect to that other data type. Additionally, sort orders or comparisons within a data type may be used to compare values of the same data type, such as data type x sort order 432, data type y sort order 434, and data type z sort order 436.
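
As a minimal sketch of how a two-level ordering such as sort order 440 and the per-type sort orders 432-436 might be expressed, the following Python example maps each value to a key whose first part ranks the data type and whose second part orders values within that type. The names TYPE_RANK, type_name, and index_key are hypothetical and chosen only for illustration, and the ranks simply follow the example order numbers<strings<arrays<timestamps<Booleans given above.

```python
from datetime import datetime

# Assumed ranks, following the example order in the text:
# numbers < strings < arrays < timestamps < Booleans.
TYPE_RANK = {"number": 0, "string": 1, "array": 2, "timestamp": 3, "boolean": 4}


def type_name(value):
    """Classify a value into one of the illustrative data types."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, datetime):
        return "timestamp"
    raise TypeError(f"unsupported data type: {type(value).__name__}")


def index_key(value):
    """Return (type rank, within-type key) so any two values can be compared."""
    name = type_name(value)
    if name == "array":
        # Arrays compare element by element, using the same key recursively.
        within_type = tuple(index_key(element) for element in value)
    else:
        within_type = value
    return (TYPE_RANK[name], within_type)
```

Because tuples compare element by element, values of different data types are ordered by type rank alone, and the within-type comparison is only reached when both values share a data type.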

One of the data types that attribute index 400 may store is a multi-value data type, such as arrays, objects, expressions, or other multi-value information that is considered a single value for that attribute in a document. In order to leverage the same index search, scan, or evaluation for performing queries that is used for attribute index 400, multi-value attributes may be decomposed so that queries directed to searching for one or more values that may be present in the multi-value attribute (e.g., a query to return a document with at least one element in an array that satisfies one or more conditions) can be evaluated against the same index.

As illustrated in FIG. 4, a multi-value attribute 402 may be inserted into attribute index 400. Attribute 402 (attribute a) may be an array that includes values of different data types, x, y, and z[x,x] (which may be a multi-value attribute, like an array, within the array). Insertion of the multi-value attribute may separate the values of the array into singular or scalar elements or values, in some embodiments. In this way, a query that searches for a multi-value data type that includes at least one value that satisfies a condition in a predicate may be evaluated by evaluating the individual scalar values in addition to the multi-value data type (which may allow for features like type bracketing, as discussed below with regard to FIGS. 5 and 6, to be enforced). For instance, leaf node 420 a may store (in a portion of the index storing other type x values) attribute a=“x” and the document id of the document including attribute a. Similarly, leaf node 420 b may store attribute a=“y” and the same document id, leaf node 420 c may store the attribute value a=“z[x,x]”, and leaf node 420 d may store the entire multi-value attribute a=“[x, y, z[x,x]].” If a query directed to attribute “a” were performed, then the individual attribute values may be evaluated at the different leaf nodes 420 (including both the individual values and the multi-value) to determine whether a predicate was met.
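
The decomposition described above can be sketched as follows, under the assumption that an index entry is simply a (value, document id) pair and that arrays are represented as Python lists. The function name index_entries is hypothetical. As in the FIG. 4 example, only the top-level elements are separated, so a nested array is stored as a single entry.

```python
def index_entries(doc_id, value):
    """Return the (value, doc_id) entries to insert for one attribute value."""
    entries = []
    if isinstance(value, list):
        # One entry per top-level element; a nested array stays intact.
        entries.extend((element, doc_id) for element in value)
    # The complete value is always stored as its own entry as well.
    entries.append((value, doc_id))
    return entries


# Example mirroring FIG. 4: a = [x, y, z[x,x]] yields four entries.
fig4_entries = index_entries("doc1", ["x", "y", ["x", "x"]])
```

Each entry carries the same document id, which is what later allows results for one document to be recognized as duplicates.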

The examples of multi-type attribute index for document databases as in FIGS. 2-4 have been given in regard to a database service (e.g., a non-relational or NoSQL database service that stores documents on behalf of a client of the service). However, various other types of non-relational or NoSQL database systems or data processing systems that provide a document database may implement a multi-type attribute index, in other embodiments. FIG. 5 is a high-level flowchart illustrating various methods and techniques to implement a multi-type attribute index for a document database, according to some embodiments. These techniques, as well as the techniques discussed with regard to FIGS. 6-7, may be implemented using components or systems as described above with regard to FIGS. 2-4, as well as other types of databases, storage engines, systems, or clients and thus the following discussion is not intended to be limiting as to the other types of systems that may implement the described techniques.

As indicated at 510, a request to perform a query at a document database that includes a first predicate for an attribute of documents stored in the document database that has values that include multiple data types may be received, in various embodiments. The query may be specified according to various languages (e.g., Structured Query Language (SQL), which may be translated to other protocols or languages), protocols, or programmatic interfaces (e.g., API calls to search, look, find, get, etc.), in some embodiments. The query may include one or more predicates which describe desired data to be returned, in some embodiments. For example, a predicate may identify or specify an attribute of documents to search for with a desired value or values using Boolean operations. As discussed above with regard to FIG. 1, the operations for comparing attribute values with respect to the predicate may include determining whether a value of the attribute being evaluated with respect to the predicate satisfies the predicate as true using <, <=, =, >, >=. In addition to the comparison, the predicate may also specify the data type of the values of the attribute to be compared (e.g., strings, integers, Booleans, arrays, etc.).

As indicated at 520, the first predicate may be replaced in a plan to perform the query with a second predicate applicable to search an index for the attribute according to a sorted order for the data types of the values of the attribute in the index. For example, the first predicate may only specify one data type for comparing the values of the attribute, as noted above. The second predicate may be a form of the first predicate that is rewritten to amend, append, or otherwise account for other data types for the attribute that may be encountered in the index. For example, additional or conjunctive comparisons made with respect to other data types based on the specified comparison and data type may be determined and included. If, for instance, the comparison is for an integer data type equal to a value of “12345” then additional comparisons may be added to search for a string data type with a value of “12345”. In some embodiments, query planners and/or optimizers that are responsible for generating the query plan for the query may intelligently identify what other comparison operations to include (e.g., using text predication or other machine learning techniques to recognize equivalent data stored in other data types).

In some embodiments, the query may specify that only the data type specified in the query predicate be considered (sometimes referred to as type bracketing). Type bracketing may be automatically applied, in some embodiments, such as is discussed below with regard to FIG. 6. Comparisons may be added as a conjunctive part of the second predicate (e.g., joined by “and”) to exclude the other data types, in some embodiments. The sorted order of data types may allow for comparisons to exclude ranges of values based on the sorted order so that if, for instance, all array types are greater in value than all integer data types (e.g., a sort order of string<integer<array), and an integer data type is specified in the predicate, then the second predicate may include a comparison that specifies all values for the attribute less than the minimum array value (and greater than the maximum string value, depending on whether a search would be performed in the index for values less than the minimum value of the specified data type).
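
One way to picture the type bracketing rewrite is the sketch below, which assumes the example sort order string<integer<array and represents a predicate as an (attribute, operator, operand) tuple. The function name bracket and the symbolic bounds ("MAX", type) and ("MIN", type) are hypothetical stand-ins for the maximum or minimum value of a neighboring data type; they are not taken from any particular query planner.

```python
# Assumed sort order for data types, from the example in the text.
TYPE_ORDER = ["string", "integer", "array"]


def bracket(attribute, op, operand, operand_type):
    """Rewrite one comparison into a conjunction limited to operand_type.

    Returns a list of (attribute, operator, operand) tuples that are
    meant to be joined by AND.
    """
    position = TYPE_ORDER.index(operand_type)
    conjuncts = [(attribute, op, operand)]
    if op in ("<", "<=") and position > 0:
        # An open lower end would sweep in every lower-sorted data type,
        # so bound it by the maximum value of the preceding type.
        lower_type = TYPE_ORDER[position - 1]
        conjuncts.append((attribute, ">", ("MAX", lower_type)))
    if op in (">", ">=") and position < len(TYPE_ORDER) - 1:
        # An open upper end would sweep in every higher-sorted data type,
        # so bound it by the minimum value of the following type.
        upper_type = TYPE_ORDER[position + 1]
        conjuncts.append((attribute, "<", ("MIN", upper_type)))
    return conjuncts


# a < 4 on integers becomes: a < 4 AND a > (maximum string value).
rewritten = bracket("a", "<", 4, "integer")
```

An equality comparison needs no extra conjunct in this sketch, since a single point in the sorted order cannot span more than one data type.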

Once the first predicate is replaced with the second predicate, the query may be performed according to the plan, as indicated at 530, in various embodiments. The query plan may be executed or caused to execute by a query engine for the document database, including performing various storage operations (e.g., via a storage engine) to scan, search, or otherwise evaluate various portions of the index using the second predicate. For tree-based index structures, such as a b-tree, comparison operators for performing the second predicate may be used to navigate different branches of the tree structure, in some embodiments. Identified attribute values that satisfy the predicate may also be associated with or co-located with an identifier, link, or location to a document that stores the attribute, in some embodiments. As indicated at 540, a result of the query may be returned to a client, in various embodiments. The result may include the identifier, link, or location of the documents, in some embodiments, associated with attribute values that satisfy the second predicate. The result may be formatted according to a same interface, protocol, or language as the query, in some embodiments.

FIG. 6 is a high-level flowchart illustrating various methods and techniques to generate a query plan for a multi-type attribute index, according to some embodiments. As indicated at 610, planning for a query may be initiated to access an index of an attribute of a document database with multiple data types of values in the index, in various embodiments. In some embodiments, predicate(s) of a query may be validated, as indicated at 620. If, for example, the query specifies an incorrect attribute name, unsupported data type, or an invalid comparison, then the predicate may be invalid. In some embodiments, predicates that would be considered invalid for databases that only store one value per attribute may be allowed. For example, if a query is received with a predicate that specifies “a<3 AND a>8”, the query may still be valid for attributes that store an array of values including a value less than 3 and a value larger than 8. In some embodiments, metadata or other information for an index may identify whether a multi-value attribute has been stored in the index. If not, then the query predicate may be considered invalid in scenarios that could only be satisfied by a multi-value attribute. For those queries with an invalid predicate, an error may be returned in response to the query, as indicated at 622, in some embodiments.

Because multiple predicates directed to the indexed attribute may be included in a query, in some embodiments, different optimizations or techniques for handling multiple predicates may be applied when evaluating a multi-type attribute index. As indicated by the positive exit from 630, a determination may be made as to whether a multi-value attribute has been stored in the index, as indicated at 640, in some embodiments. For example, if an array for the attribute has been stored in one of the index entries, then metadata indicating that a multi-value attribute has been stored may be flagged, set, or otherwise updated to indicate the presence of a multi-value attribute in the index. If no multi-value attribute has been stored in the index, then, as indicated at 642, multiple predicates may be included in a portion of the plan to evaluate the index, in some embodiments. In this way, the processing of intermediate results from the index based on a single predicate can be eliminated as each predicate may be evaluated for an entry of the index.

As indicated by the positive exit from 640, when multi-value attributes are stored in the index, one of the predicates is included in a portion of the plan to evaluate the index, while the non-selected predicates may be applied to the results of the selected predicate, as indicated at 650, in various embodiments. In this way, a multi-value attribute stored as scalar elements and as a whole (as discussed above with regard to FIG. 4) may not be improperly excluded from a result of a query. Consider an example where an attribute value is an array, “[1, 10, 20].” If the array were stored in the index separately (as “1,” “10,” “20,” and “[1, 10, 20]”), then passing two predicates (e.g., a<“8” and a>“4”) would not return any matching results (as none of the single values could satisfy both predicates and the predicates may not perform the comparison of number to array if type bracketing is enabled). If instead the predicate a<“8” were selected, then the returned attribute value could also be compared to see if it evaluated true using the remaining predicate “a>4” (which it would because one of the array values, “10,” is greater than “4”). As indicated at 632, if multiple predicates directed to the indexed attribute are not included, then the predicate may be included in a portion of the plan to evaluate the index.
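
The “[1, 10, 20]” example can be traced with the following sketch, which contrasts evaluating every predicate against each index entry with evaluating one selected predicate in the index and the rest as a residual check. It assumes index entries are (value, document id) pairs, that type bracketing limits comparisons to numbers, and that the full attribute value can be looked up by document id. All function names are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, ">=": operator.ge}


def is_number(value):
    # bool is excluded because Python treats it as a subclass of int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches(value, predicate):
    """Type-bracketed comparison: only numeric values can satisfy it."""
    op, operand = predicate
    return is_number(value) and OPS[op](value, operand)


def build_entries(documents):
    """Store each array as scalar elements plus the whole array."""
    entries = []
    for doc_id, value in documents.items():
        if isinstance(value, list):
            entries.extend((element, doc_id) for element in value)
        entries.append((value, doc_id))
    return entries


def all_predicates_in_index(entries, predicates):
    """Every predicate is checked against each single index entry."""
    return {doc_id for value, doc_id in entries
            if all(matches(value, p) for p in predicates)}


def one_predicate_then_residual(entries, documents, selected, residual):
    """Evaluate one predicate in the index, the rest on the candidates."""
    candidates = {doc_id for value, doc_id in entries if matches(value, selected)}
    results = set()
    for doc_id in candidates:
        value = documents[doc_id]
        elements = value if isinstance(value, list) else [value]
        # A residual predicate holds if any element of the value satisfies it.
        if all(any(matches(e, p) for e in elements) for p in residual):
            results.add(doc_id)
    return results
```

With documents {"d1": [1, 10, 20], "d2": 6, "d3": 2} and the predicates a<8 and a>4, the first function finds only "d2", while the second also finds "d1" because 1 satisfies a<8 in the index and 10 satisfies the residual a>4.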

As indicated at 660, other predicate(s) to exclude data type(s) of the attribute may be included, in some embodiments. If, for instance, the data type of the query bracket is a string, then Boolean comparisons according to the sorted order for data types may be used as predicates to exclude other data types by including a predicate that specifies that the value be less than (and/or greater than) the minimum or maximum value of the adjacent type of data, in some embodiments. In this way, multiple types of data can be stored in the same index, preserving a sorted order in order to make the index searchable while retaining the difference in data type.

As indicated at 670, in some embodiments, an operation to filter duplicate documents returned from an evaluation of the index may be added to the plan. For example, because a multi-value attribute may be also stored as individual scalar values, multiple of the scalar values may satisfy a predicate. However, only 1 indication of a document should be returned as the scalar values are not representative of independent values but part of a composite value. Filtering out duplicate documents may prevent returning an erroneous result of multiple documents satisfying a query when only 1 document satisfies the query, for instance.
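
A duplicate filter of this kind can be sketched in a few lines, assuming the index evaluation yields (value, document id) pairs. The function name unique_documents is hypothetical, and the first-seen order is preserved only for readability.

```python
def unique_documents(matching_entries):
    """Return each document id once, in the order it was first matched."""
    seen = set()
    result = []
    for _value, doc_id in matching_entries:
        if doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    return result


# Two scalar elements of the same array in "d1" satisfy the predicate,
# but "d1" is reported only once.
matched = [(1, "d1"), (3, "d1"), (2, "d2")]
unique = unique_documents(matched)
```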

As indicated at 680, a plan for the query based on the included predicates and operation may be finalized, in various embodiments. Instructions to perform index evaluations may be generated for the included predicates, residual or intermediate processing predicates may be instructed to be performed upon intermediate results of index evaluations, and filter operations may be performed upon query results prior to returning the results to the client, in the plan, in some embodiments. In some embodiments, duplicate documents may be filtered by performing a bitmap index scan.

FIG. 7 is a high-level flowchart illustrating various methods and techniques to create a multi-type attribute index, according to some embodiments. As indicated at 710, a request may be received to create an index for an attribute of a document database, in some embodiments. The request may be received according to a programmatic interface (e.g., a create index API), script, language, statement or other protocol. The request may identify the attribute of the document database, in some embodiments. For example, the request may specify an attribute name or other identifier.

As indicated at 720, the document database may be scanned to obtain values of the attribute from documents of the document database. Documents, as discussed above with regard to FIG. 1, may be stored in a format that can be serialized or read, such as BSON or ASON. Scanning the documents may include accessing the documents and reading the documents in the stored format to look for the identified attribute in each document. If the attribute is present, the value may be retrieved along with an identifier for the document for insertion into the index.

As the values are obtained (or once all of the values are obtained), the values of the attribute may be inserted into the index using comparison operations that implement a sorted order for data types of the attribute, as indicated at 730, in various embodiments. For example, each value of the same data type may be compared with other values in the index of the same data type using comparisons, <, <=, =, >, >=, for the data type (e.g., string comparisons, array comparisons, timestamp comparisons, etc.). In addition to comparing the value of the attribute with those values stored in the same type, the value may also be compared with those values of other data types using the sorted order for data types (e.g., string→timestamp→array→integer). Insertion techniques may depend on the index structure being used. For instance, b-tree insertion techniques may be performed to make comparisons of a value to be inserted according to the value ranges mapped to child nodes starting from the root node to perform an insertion at a leaf node (and any needed splitting or rebalancing that may occur as a result).
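
The scan and insert elements at 720 and 730 might look like the sketch below. It is a simplification under stated assumptions: a sorted Python list stands in for the b-tree, documents are plain dictionaries keyed by document id, the type ranks follow the example order string→timestamp→array→integer, and the decomposition of multi-value attributes into separate entries is omitted. The names TYPE_RANK, sort_key, and build_index are hypothetical.

```python
import bisect
from datetime import datetime

# Assumed ranks for the example order string -> timestamp -> array -> integer.
TYPE_RANK = {"string": 0, "timestamp": 1, "array": 2, "integer": 3}


def sort_key(value):
    """Return (type rank, within-type key) for a supported value."""
    if isinstance(value, str):
        return (TYPE_RANK["string"], value)
    if isinstance(value, datetime):
        return (TYPE_RANK["timestamp"], value)
    if isinstance(value, list):
        return (TYPE_RANK["array"], tuple(sort_key(v) for v in value))
    if isinstance(value, int) and not isinstance(value, bool):
        return (TYPE_RANK["integer"], value)
    raise TypeError(f"unsupported data type: {type(value).__name__}")


def build_index(documents, attribute):
    """Scan the documents and insert each found value in sorted position."""
    index = []  # sorted list of (sort key, document id) entries
    for doc_id, document in documents.items():
        if attribute in document:  # documents without the attribute are skipped
            bisect.insort(index, (sort_key(document[attribute]), doc_id))
    return index


example_documents = {
    "d1": {"a": 7},
    "d2": {"a": "smith"},
    "d3": {"b": 1},
    "d4": {"a": [1, "x"]},
    "d5": {"a": 3},
}
example_index = build_index(example_documents, "a")
```

A real b-tree would locate the leaf node by descending from the root using the same comparisons; the binary search performed by bisect.insort plays that role here.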

Completion of the index may be acknowledged, as indicated at 740, in various embodiments. The acknowledgment may be specified according to the same format or interface in which the request was submitted, in some embodiments. Techniques similar to those for creating the index may be performed to update the index, inserting or removing (or marking for deletion or to be ignored) those values of the attribute that are added to or removed from the document database (e.g., due to an edit to the document or removal of the document), in some embodiments.

The methods described herein may in various embodiments be implemented by any combination of hardware and software. For example, in one embodiment, the methods may be implemented by a computer system (e.g., a computer system as in FIG. 8) that includes one or more processors executing program instructions stored on a computer-readable storage medium coupled to the processors. The program instructions may implement the functionality described herein (e.g., the functionality of various servers and other components that implement the distributed systems described herein). The various methods as illustrated in the figures and described herein represent example embodiments of methods. The order of any method may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.

Embodiments of multi-type attribute indexes for document databases as described herein may be executed on one or more computer systems, which may interact with various other devices. One such computer system is illustrated by FIG. 8. In different embodiments, computer system 1000 may be any of various types of devices, including, but not limited to, a personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing node or compute node, computing device or electronic device.

In the illustrated embodiment, computer system 1000 includes one or more processors 1010 coupled to a system memory 1020 via an input/output (I/O) interface 1030. Computer system 1000 further includes a network interface 1040 coupled to I/O interface 1030, and one or more input/output devices 1050, such as cursor control device, keyboard, and display(s). Display(s) may include standard computer monitor(s) and/or other display systems, technologies or devices, in one embodiment. In some embodiments, it is contemplated that embodiments may be implemented using a single instance of computer system 1000, while in other embodiments multiple such systems, or multiple nodes making up computer system 1000, may host different portions or instances of embodiments. For example, in one embodiment some elements may be implemented via one or more nodes of computer system 1000 that are distinct from those nodes implementing other elements.

In various embodiments, computer system 1000 may be a uniprocessor system including one processor 1010, or a multiprocessor system including several processors 1010 (e.g., two, four, eight, or another suitable number). Processors 1010 may be any suitable processor capable of executing instructions, in one embodiment. For example, in various embodiments, processors 1010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 1010 may commonly, but not necessarily, implement the same ISA.

In some embodiments, at least one processor 1010 may be a graphics processing unit. A graphics processing unit or GPU may be considered a dedicated graphics-rendering device for a personal computer, workstation, game console or other computing or electronic device, in one embodiment. Modern GPUs may be very efficient at manipulating and displaying computer graphics, and their highly parallel structure may make them more effective than typical CPUs for a range of complex graphical algorithms. For example, a graphics processor may implement a number of graphics primitive operations in a way that makes executing them much faster than drawing directly to the screen with a host central processing unit (CPU). In various embodiments, graphics rendering may, at least in part, be implemented by program instructions for execution on one of, or parallel execution on two or more of, such GPUs. The GPU(s) may implement one or more application programmer interfaces (APIs) that permit programmers to invoke the functionality of the GPU(s), in one embodiment.

System memory 1020 may store program instructions 1025 and/or data accessible by processor 1010, in one embodiment. In various embodiments, system memory 1020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing desired functions, such as those described above, are shown stored within system memory 1020 as program instructions 1025 and data storage 1035, respectively. In other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media or on similar media separate from system memory 1020 or computer system 1000. A computer-accessible medium may include non-transitory storage media or memory media such as magnetic or optical media, e.g., disk or CD/DVD-ROM coupled to computer system 1000 via I/O interface 1030. Program instructions and data stored via a computer-accessible medium may be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1040, in one embodiment.

In one embodiment, I/O interface 1030 may coordinate I/O traffic between processor 1010, system memory 1020, and any peripheral devices in the device, including network interface 1040 or other peripheral interfaces, such as input/output devices 1050. In some embodiments, I/O interface 1030 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1020) into a format suitable for use by another component (e.g., processor 1010). In some embodiments, I/O interface 1030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 1030 may be split into two or more separate components, such as a north bridge and a south bridge, for example. In addition, in some embodiments some or all of the functionality of I/O interface 1030, such as an interface to system memory 1020, may be incorporated directly into processor 1010.

Network interface 1040 may allow data to be exchanged between computer system 1000 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1000, in one embodiment. In various embodiments, network interface 1040 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.

Input/output devices 1050 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer systems 1000. Multiple input/output devices 1050 may be present in computer system 1000 or may be distributed on various nodes of computer system 1000, in one embodiment. In some embodiments, similar input/output devices may be separate from computer system 1000 and may interact with one or more nodes of computer system 1000 through a wired or wireless connection, such as over network interface 1040.

As shown in FIG. 8, memory 1020 may include program instructions 1025 that implement the various embodiments of the systems as described herein, and data storage 1035, comprising various data accessible by program instructions 1025, in one embodiment. In one embodiment, program instructions 1025 may include software elements of embodiments as described herein and as illustrated in the Figures. Data storage 1035 may include data that may be used in embodiments. In other embodiments, other or different software elements and data may be included.

Those skilled in the art will appreciate that computer system 1000 is merely illustrative and is not intended to limit the scope of the embodiments as described herein. In particular, the computer system and devices may include any combination of hardware or software that can perform the indicated functions, including a computer, personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, network device, internet appliance, PDA, wireless phones, pagers, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device. Computer system 1000 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.

Those skilled in the art will also appreciate that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a computer-readable medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link. This computer readable storage medium may be non-transitory. Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present invention may be practiced with other computer system configurations.

Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as RAM (e.g., SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link.

The various methods as illustrated in the Figures and described herein represent example embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of any method may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.

Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the invention embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. A system, comprising: a memory to store program instructions which, if performed by at least one processor, cause the at least one processor to perform a method to at least: receive a request to perform a query at a document database that includes a first predicate for an attribute of documents stored in the document database, wherein different respective values of the attribute include a plurality of data types; generate a plan to perform the query that replaces the first predicate with a second predicate applicable to search an index for the attribute according to a sorted order for the data types of the different respective values in the index; perform the query according to the plan; and return a result of the query to a client.
 2. The system of claim 1, wherein the program instructions cause the at least one processor to perform the method to further: receive a request to create the index for the attribute; in response to the receipt of the request to create the index: scan the documents of the document database to obtain the respective values from the documents of the document database; and insert the respective values into the index using comparison operations that implement the sorted order for the data types.
 3. The system of claim 2, wherein at least one of the respective values is a multi-value attribute, and wherein to insert the respective values into the index, the program instructions cause the at least one processor to perform the method to: insert individual values of the multi-value attribute in separate entries of the index; and insert the multi-value attribute in another entry of the index.
 4. The system of claim 1, wherein the at least one processor and the memory are implemented as part of a network-based database service, wherein the document database is stored in the database service on behalf of a client of the database service, and wherein the query is received via an Application Programming Interface (API) for the network-based service.
 5. A method, comprising: receiving a request to perform a query at a document database that includes a first predicate for an attribute of documents stored in the document database, wherein different respective values of the attribute include a plurality of data types; replacing the first predicate in a plan to perform the query with a second predicate applicable to search an index for the attribute according to a sorted order for the data types of the different respective values in the index; performing the query according to the plan; and returning a result of the query to a client.
 6. The method of claim 5, wherein the second predicate includes a predicate that excludes one or more of the data types according to the sorted order for the data types.
 7. The method of claim 5, further comprising updating the index to include an additional value of the attribute stored in the document database using one or more comparison operations that implement the sorted order for the data types.
 8. The method of claim 5, wherein performing the query according to the plan comprises: determining that the index does not store a multi-value attribute; and in response to the determining, including an additional predicate with the second predicate to be applied when searching the index.
 9. The method of claim 5, wherein performing the query according to the plan comprises: determining that the index does store a multi-value attribute; and in response to the determining, applying an additional predicate to intermediate results from the search of the index applying the second predicate.
 10. The method of claim 5, wherein the plan further comprises an operation to remove duplicate documents returned from an evaluation of the index.
 11. The method of claim 5, further comprising: receiving a request to create the index for the attribute; in response to receiving the request to create the index: obtaining the respective values from the documents of the document database; and inserting the respective values into the index using comparison operations that implement the sorted order for the data types.
 12. The method of claim 11, wherein at least one of the respective values is a multi-value attribute, and wherein inserting the respective values into the index comprises: inserting individual values of the multi-value attribute in separate entries of the index; and inserting the multi-value attribute in another entry of the index.
 13. The method of claim 11, wherein the index is a B-tree, and wherein inserting the respective values into the index comprises identifying respective leaf nodes for the respective values according to the comparison operations.
 14. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices cause the one or more computing devices to implement: receiving a request to perform a query at a document database that includes a first predicate for an attribute of documents stored in the document database, wherein different respective values of the attribute include a plurality of data types; rewriting the first predicate to replace the first predicate with a second predicate applicable to search an index for the attribute according to a sorted order for the data types of the different respective values in the index; including the second predicate as part of a plan to perform the query; performing the query according to the plan; and returning a result of the query to a client.
 15. The non-transitory, computer-readable storage medium of claim 14, wherein the program instructions cause the one or more computing devices to further implement: receiving a request to create the index for the attribute; in response to receiving the request to create the index: scanning the documents of the document database to obtain the respective values from the documents of the document database; and inserting the respective values into the index using comparison operations that implement the sorted order for the data types.
 16. The non-transitory, computer-readable storage medium of claim 15, wherein at least one of the respective values is a multi-value attribute, and wherein, in inserting the respective values into the index, the program instructions cause the one or more computing devices to implement: inserting individual values of the multi-value attribute in separate entries of the index; and inserting the multi-value attribute in another entry of the index.
 17. The non-transitory, computer-readable storage medium of claim 16, wherein in performing the query according to the plan, the program instructions cause the one or more computing devices to implement applying the second predicate to the separate entries of the individual values in the index and the other entry of the multi-value attribute in the index.
 18. The non-transitory, computer-readable storage medium of claim 14, wherein the second predicate includes a predicate that excludes one or more of the data types according to the sorted order for the data types.
 19. The non-transitory, computer-readable storage medium of claim 14, wherein, in performing the query according to the plan, the program instructions cause the one or more computing devices to implement: determining that the index does not store a multi-value attribute; and in response to the determining, including an additional predicate with the second predicate to be applied when searching the index.
 20. The non-transitory, computer-readable storage medium of claim 14, wherein the one or more computing devices are implemented as part of a network-based database service, wherein the document database is stored in the database service on behalf of a client of the database service, and wherein the query is received via an Application Programming Interface (API) for the network-based service.
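
By way of illustration only, and without limiting the claims above, the following minimal Python sketch shows one way the recited techniques might fit together: values are inserted using a comparison that orders by data type before value, a multi-value attribute is stored both as separate entries and as another whole entry, a "greater than" predicate is replaced with a range predicate that excludes the other data types, and duplicate documents are removed from the result. Every name here (TYPE_RANK, sort_key, AttributeIndex, rewrite_greater_than) is hypothetical, the particular type order (null, number, string, array) is an assumption that the claims do not specify, and a sorted list stands in for whatever index structure (for example, a B-tree) an embodiment may use.

```python
import bisect

# Hypothetical rank of each data type in the sorted order (an assumption).
# Booleans and nested arrays are deliberately left out of this sketch.
TYPE_RANK = {type(None): 0, int: 1, float: 1, str: 2, tuple: 3}


def sort_key(value):
    """Build a key that compares by data type first, then by value."""
    rank = TYPE_RANK[type(value)]
    if value is None:
        return (rank, 0)
    if isinstance(value, tuple):
        # A whole multi-value attribute compares element by element.
        return (rank, tuple(sort_key(v) for v in value))
    return (rank, value)


def rewrite_greater_than(constant):
    """Replace `attr > constant` with an exclusive key range.

    The upper bound (rank + 1,) sorts before every key of the next data
    type, so the range excludes all data types other than the constant's.
    """
    low = sort_key(constant)
    high = (low[0] + 1,)
    return low, high


class AttributeIndex:
    """Sorted list standing in for the index on a single attribute."""

    def __init__(self):
        self.keys = []     # sort keys, kept in sorted order
        self.doc_ids = []  # doc_ids[i] is the document for keys[i]

    def _insert_entry(self, key, doc_id):
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.doc_ids.insert(pos, doc_id)

    def insert(self, doc_id, value):
        if isinstance(value, list):
            # Individual values of a multi-value attribute get separate entries...
            for item in value:
                self._insert_entry(sort_key(item), doc_id)
            # ...and the multi-value attribute itself gets another entry.
            value = tuple(value)
        self._insert_entry(sort_key(value), doc_id)

    def query_greater_than(self, constant):
        low, high = rewrite_greater_than(constant)
        start = bisect.bisect_right(self.keys, low)   # exclusive lower bound
        stop = bisect.bisect_left(self.keys, high)    # exclusive upper bound
        # A document may match through several entries; drop duplicates.
        return list(dict.fromkeys(self.doc_ids[start:stop]))


index = AttributeIndex()
documents = {1: 3, 2: 10, 3: "apple", 4: [7, 9], 5: None, 6: 5.5}
for doc_id, value in documents.items():
    index.insert(doc_id, value)

print(index.query_greater_than(5))    # [6, 4, 2]
print(index.query_greater_than("a"))  # [3]
```

In this sketch, document 4 matches the numeric query through both of its separate entries (7 and 9) and is returned once after duplicate removal, while the string, null, and whole-array entries fall outside the rewritten range and are never compared against the numeric constant.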